{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "<a id='HOME'></a>\n",
    "# CHAPTER 7 Mangle Data Like a Pro\n",
    "## 向高手一樣玩轉數據\n",
    "\n",
    "* [7.1.1 Unicode](#Unicode)\n",
    "* [7.1.2 格式化](#Format)\n",
    "* [7.1.3 正規表達式](#RegularExpressions)\n",
    "* [7.2.1 bytes and bytearray](#bytesbytearray)\n",
    "* [7.2.2 使用struct轉換二進位資料](#struct)\n",
    "* [7.2.3 其他二進位工具](#OtherTools)\n",
    "* [7.2.4 binascii函數](#binascii)\n",
    "* [7.2.5 位元運算](#BitOperators)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='Unicode'></a>\n",
    "## 7.1.1 Unicode\n",
    "[回目錄](#HOME)\n",
    "\n",
    "早期電腦發展時的所使用的ASCII只有128種，只能應付英文和數字以及一些基本的符號，所以發展出了Unicode來面對全世界所有的符號\n",
    "\n",
    "\\u 加上4碼16進位的數字表示Unicode 中的 256 個基本語言，前兩碼為類別，後兩碼為索引  \n",
    "\\U 加上8碼16進位的數字為表示超出上述範圍內的字符，最左一位須為0，\\N{name}用來指定字符名稱\n",
    "(完整清單 http://www.unicode.org/charts/charindex.html)\n",
    "\n",
    "python的unicodedata模組提供了下面兩個方向的轉換函數：\n",
    "\n",
    "* __lookup()__ - 接受不區分大小寫的標準名稱，返回一個Unicode的字符;\n",
    "* __name()__ - 接受一個的Unicode字符，返回大寫形式的名稱。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "value=\"A\", name=\"LATIN CAPITAL LETTER A\", value2=\"A\"\n",
      "value=\"$\", name=\"DOLLAR SIGN\", value2=\"$\"\n",
      "value=\"¢\", name=\"CENT SIGN\", value2=\"¢\"\n",
      "value=\"€\", name=\"EURO SIGN\", value2=\"€\"\n",
      "value=\"☃\", name=\"SNOWMAN\", value2=\"☃\"\n",
      "café\n",
      "value=\"é\", name=\"LATIN SMALL LETTER E WITH ACUTE\", value2=\"é\"\n",
      "b'\\\\xe9'\n",
      "value=\"é\", name=\"LATIN SMALL LETTER E WITH ACUTE\", value2=\"é\"\n"
     ]
    }
   ],
   "source": [
    "def unicode_test(value):\n",
    "    import unicodedata\n",
    "    name = unicodedata.name(value)\n",
    "    value2 = unicodedata.lookup(name)\n",
    "    print('value=\"%s\", name=\"%s\", value2=\"%s\"' % (value, name, value2))\n",
    "    \n",
    "unicode_test('A')\n",
    "unicode_test('$')\n",
    "unicode_test('\\u00a2')\n",
    "unicode_test('\\u20ac')\n",
    "unicode_test('\\u2603')\n",
    "\n",
    "\n",
    "# 想知道é這個符號的編碼\n",
    "place = 'café'\n",
    "print(place)\n",
    "unicode_test('é')\n",
    "print('é'.encode('unicode-escape'))   #這裡\n",
    "\n",
    "unicode_test('\\u00e9')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "使用utf-8進行編碼與解碼\n",
    "\n",
    "* 將字符串編碼為字節；\n",
    "* 將字節解碼為字符串。\n",
    "\n",
    "使用__encode()__來編碼字符串成我們看得懂的\n",
    "\n",
    "|編碼 | 說明 |\n",
    "|:---:|-----|\n",
    "|'ascii' | ASCII 編碼|\n",
    "|'utf-8' |最常用的編碼|\n",
    "|'latin-1' | ISO 8859-1 編碼|\n",
    "|'cp-1252' |Windows 常用編碼|\n",
    "|'unicode-escape' |Python 中 Unicode 的文本格式， \\uxxxx 或者 \\Uxxxxxxxx|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "☃\n",
      "b'\\\\u2603'\n",
      "1\n",
      "3\n",
      "b'\\xe2\\x98\\x83'\n",
      "☃\n"
     ]
    }
   ],
   "source": [
    "#編碼\n",
    "snowman = '\\u2603'\n",
    "\n",
    "print(snowman)\n",
    "print(snowman.encode('unicode-escape'))\n",
    "\n",
    "\n",
    "print(len(snowman))\n",
    "ds = snowman.encode('utf-8')\n",
    "\n",
    "print(len(ds))\n",
    "print(ds)\n",
    "print('☃')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "café\n",
      "b'caf\\xc3\\xa9'\n",
      "café\n"
     ]
    }
   ],
   "source": [
    "#解碼\n",
    "\n",
    "place = 'caf\\u00e9'\n",
    "print(place)\n",
    "\n",
    "place_bytes = place.encode('utf-8')\n",
    "print(place_bytes)\n",
    "\n",
    "place2 = place_bytes.decode('utf-8')\n",
    "print(place2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='Format'></a>\n",
    "## 7.1.2 格式化\n",
    "[回目錄](#HOME)\n",
    "\n",
    "有兩種方法可以格式化文字\n",
    "* __string % data__\n",
    "* __{}.format__  (新的寫法，推薦使用)\n",
    "* __f\"{Variable Name}\"__ (Python 3.6之後才有)\n",
    "\n",
    "第一種搭配的format符號表如下\n",
    "\n",
    "|符號|種類|\n",
    "|---|---|\n",
    "|%s | 字串|\n",
    "|%d | 十進制整數|\n",
    "|%x | 十六進制整數|\n",
    "|%o | 八進制整數|\n",
    "|%f | 十進制浮點數|\n",
    "|%e | 以科學計數法表示的浮點數|\n",
    "|%g | 十進製或科學計數法表示的浮點數|\n",
    "|%% | 文本值 % 本身|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "42\n",
      "42\n",
      "2a\n",
      "52\n",
      "7.03\n",
      "7.030000\n",
      "7.030000e+00\n",
      "7.03\n",
      "100%\n",
      "混合搭配文字[我是文字]，以及數字[87.000000]\n",
      "        42\n",
      "      0042\n",
      "      42.0\n",
      "42.0\n",
      "42        \n",
      "42.0      \n"
     ]
    }
   ],
   "source": [
    "# 方法一\n",
    "\n",
    "print('%s' % 42)\n",
    "print('%d' % 42)\n",
    "print('%x' % 42)\n",
    "print('%o' % 42)\n",
    "print('%s' % 7.03)\n",
    "print('%f' % 7.03)\n",
    "print('%e' % 7.03)\n",
    "print('%g' % 7.03)\n",
    "print('%d%%' % 100)\n",
    "print('混合搭配文字[%s]，以及數字[%f]' % ('我是文字',87))\n",
    "\n",
    "#可搭配數字做位數控制\n",
    "print('%10d' % 42)\n",
    "print('%10.4d' % 42)\n",
    "print('%10.1f' % 42)\n",
    "print('%.1f' % 42)\n",
    "\n",
    "print('%-10d' % 42)\n",
    "print('%-10.1f' % 42)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "42 7.03 string cheese\n",
      "string cheese 42 7.03\n",
      "42 7.03 string cheese\n",
      "42 7.03 string cheese other\n",
      "===========分隔線===========\n",
      "42 7.030000 string cheese\n",
      "42 7.030000 string cheese\n",
      "===========分隔線===========\n",
      "        42   7.030000 string cheese\n",
      "        42   7.030000 string cheese\n",
      "42         7.030000   string cheese\n",
      "    42      7.030000  string cheese\n",
      "0000000042     7.0300       stri\n",
      "!!!!!!BIG SALE!!!!!!\n"
     ]
    }
   ],
   "source": [
    "# 方法二\n",
    "\n",
    "n = 42\n",
    "f = 7.03\n",
    "s = 'string cheese'\n",
    "\n",
    "print('{} {} {}'.format(n, f, s))\n",
    "print('{2} {0} {1}'.format(n, f, s))\n",
    "print('{n} {f} {s}'.format(n=42, f=7.03, s='string cheese'))\n",
    "\n",
    "# 使用字典傳入\n",
    "d = {'n': 42, 'f': 7.03, 's': 'string cheese'}\n",
    "print('{0[n]} {0[f]} {0[s]} {1}'.format(d, 'other')) #0表示format的第一個參數，1表示第二個參數\n",
    "\n",
    "# 方法一中的format也可以用在新方法，採用:來做銜接\n",
    "print('===========分隔線===========')\n",
    "print('{0:d} {1:f} {2:s}'.format(n, f, s))\n",
    "print('{n:d} {f:f} {s:s}'.format(n=42, f=7.03, s='string cheese'))\n",
    "print('===========分隔線===========')\n",
    "print('{0:10d} {1:10f} {2:10s}'.format(n, f, s))        #指定寬度\n",
    "print('{0:>10d} {1:>10f} {2:>10s}'.format(n, f, s))     #右對齊\n",
    "print('{0:<10d} {1:<10f} {2:<10s}'.format(n, f, s))     #左對齊\n",
    "print('{0:^10d} {1:^10f} {2:^10s}'.format(n, f, s))     #置中對齊\n",
    "print('{0:>010d} {1:>10.4f} {2:>10.4s}'.format(n, f, s)) #與舊方法不同，整數沒有經度設定項\n",
    "print('{0:!^20s}'.format('BIG SALE'))                   #指定填充符號"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='RegularExpressions'></a>\n",
    "## 7.1.3 正規表達式\n",
    "[回目錄](#HOME)\n",
    "\n",
    "\n",
    "採用wiki的說法\n",
    "\n",
    "正規表示式，又稱正則表達式、正規表示法、正規運算式、規則運算式、常規表示法（英語：Regular Expression，在代碼中常簡寫為regex、regexp或RE），  \n",
    "電腦科學的一個概念。正規表示式使用單個字串來描述、符合一系列符合某個句法規則的字串。  \n",
    "在很多文字編輯器裡，正則運算式通常被用來檢索、取代那些符合某個模式的文字。\n",
    "\n",
    "簡單來說，就是可以用來匹配__字串(source)__中的__規則(pattern)__\n",
    "\n",
    "```python\n",
    "import re  #從標準函式庫引入\n",
    "```\n",
    "\n",
    "|function  | 功能  |\n",
    "|----------|------|\n",
    "|re.match( pattern, source )   | 查看字串是否以規定的規則開頭        |\n",
    "|re.search( pattern, source )  | 會返回第一次成功的匹配值 (如果有成功)  |\n",
    "|re.findall( pattern, source) | 會返回所有成功且不重複的匹配值 (如果有成功)  |\n",
    "|re.split( pattern, source )   | 會根據 規則 將 字串 切分成若干段，返回由這些片段組成的list |\n",
    "|re.sub( pattern, replacement, source )     | 還需一個額外的參數 replacement，它會把 字串 中所有匹配規則的字串 替換成 replacement|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "----------match----------\n",
      "You\n",
      "\n",
      "----------compile後match----------\n",
      "<_sre.SRE_Match object; span=(0, 3), match='You'>\n",
      "You\n",
      "\n",
      "----------match使用.*找任何位置----------\n",
      "Young Frank\n",
      "\n",
      "----------search----------\n",
      "Frank\n",
      "\n",
      "----------findall----------\n",
      "['n', 'n', 'n', 'n']\n",
      "共找到 4 筆符合值\n",
      "\n",
      "['ng ', 'nke', 'nst']\n",
      "['ng', 'nk', 'ns', 'n']\n",
      "\n",
      "----------split----------\n",
      "['You', 'g Fra', 'ke', 'stei', '']\n",
      "\n",
      "----------sub----------\n",
      "You?g Fra?ke?stei?\n",
      "['Fra']\n"
     ]
    }
   ],
   "source": [
    "import re\n",
    "# .group()可以叫出符合正規表達式的字串部分\n",
    "\n",
    "print('----------match----------')\n",
    "# 檢查'Young Frankenstein'是否以'You'開頭\n",
    "result = re.match('You', 'Young Frankenstein')\n",
    "if result:\n",
    "    print(result.group())\n",
    "\n",
    "\n",
    "print('\\n----------compile後match----------')\n",
    "# 針對較複雜情況可以先編譯一個物件出來加速判斷\n",
    "youpattern = re.compile('You')\n",
    "result = youpattern.match('Young Frankenstein')\n",
    "print(result)\n",
    "if result:\n",
    "    print(result.group())\n",
    "\n",
    "print('\\n----------match使用.*找任何位置----------')\n",
    "# \".\"為除「\\n」之外的任何單個字元。  \"*\"為符合前面的子運算式零次或多次。\n",
    "# 組合在一起則成為匹配任意長度任意字元(除「\\n」)的規則\n",
    "m = re.match('.*Frank', 'Young Frankenstein')\n",
    "if m:\n",
    "    print(m.group())\n",
    "\n",
    "print('\\n----------search----------')\n",
    "# 可以不用透過\".*\"來找任意位置的符合值\n",
    "m = re.search('Frank', 'Young Frankenstein')\n",
    "if m: # search返回物件\n",
    "    print(m.group())\n",
    "    \n",
    "print('\\n----------findall----------')\n",
    "# 尋找所有符合的\n",
    "m = re.findall('n', 'Young Frankenstein')\n",
    "print(m) # findall返回了一个列表\n",
    "print('共找到', len(m), '筆符合值\\n')\n",
    "\n",
    "#尋找後方有一個字元的\n",
    "m = re.findall('n..', 'Young Frankenstein')\n",
    "print(m) # findall返回了一个列表\n",
    "\n",
    "#尋找後方有一個字元(可以沒有)的\n",
    "m = re.findall('n.?', 'Young Frankenstein')\n",
    "print(m) # findall返回了一个列表\n",
    "\n",
    "print('\\n----------split----------')\n",
    "# 利用規格做切割字串\n",
    "m = re.split('n', 'Young Frankenstein')\n",
    "print(m) # split返回了一个列表\n",
    "\n",
    "print('\\n----------sub----------')\n",
    "# 利用規格做替換字串\n",
    "m = re.sub('n', '?', 'Young Frankenstein')\n",
    "print(m) # sub返回了一个列表\n",
    "\n",
    "\n",
    "#尋找英文單字邊界\n",
    "m = re.findall(r'\\bFra', 'Young Frankenstein')\n",
    "print(m) # findall返回了一个列表"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "|特殊字元|功能|\n",
    "|:---:|--------|\n",
    "|.    |代表任意除 \\n 外的字元|\n",
    "|\\*   |表示任意多個字元(包括 0 個)|\n",
    "|?    |表示可選字元( 0 個或 1 個)|\n",
    "|\\d   |一個數字字元。等價於[0-9]|\n",
    "|\\D   |一個非數字字元。等價於[^0-9]|\n",
    "|\\w   |一個 字母 或 數字 包括底線字元。等價於[A-Za-z0-9\\_]|\n",
    "|\\W   |一個 非字母 非數字 非底線字元。等價於[^A-Za-z0-9\\_]|\n",
    "|\\s   |空白字元。等價於[ \\f\\n\\r\\t\\v]|\n",
    "|\\S   |非空白字元。等價於[^ \\f\\n\\r\\t\\v]|\n",
    "|\\b   |單詞邊界（一個 \\w 與 \\W 之間的範圍，順序可逆）|\n",
    "|\\B   |非單詞邊界|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN\n",
      "OPQRSTUVWXYZ!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n",
      "\r",
      "\u000b",
      "\f",
      "\n",
      "['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']\n",
      "['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_']\n",
      "[' ', '\\t', '\\n', '\\r', '\\x0b', '\\x0c']\n"
     ]
    }
   ],
   "source": [
    "import string\n",
    "printable = string.printable  #100個ASCII字元\n",
    "len(printable)\n",
    "\n",
    "print(printable[0:50])\n",
    "print(printable[50:])\n",
    "\n",
    "print(re.findall('\\d', printable))  #找數字\n",
    "print(re.findall('\\w', printable))  #找字母與數字\n",
    "print(re.findall('\\s', printable))  #找空白\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "直線符號(|)在markdown中的表格會變成區分格子用，我打不出來....請各位使用時自行替換\n",
    "\n",
    "|規則\t\t\t\t|功能                                  |\n",
    "|-------------------|--------------------------------------|\n",
    "|abc\t\t\t\t|文本值 abc                            |\n",
    "|(expr)\t\t\t\t|expr                                  |\n",
    "|expr1 直線符號 expr2\t\t|expr1 或 expr2                        |\n",
    "|. \t\t\t\t\t|除 \\n 外的任何字元                    |\n",
    "|^\t\t\t\t\t|源字元串的開頭                        |\n",
    "|$\t\t\t\t\t|源字元串的結尾                        |\n",
    "|prev?\t\t\t\t|0 個或 1 個 prev                      |\n",
    "|prev*\t\t\t\t|0 個或多個 prev，盡可能多地匹配       |\n",
    "|prev*?\t\t\t\t|0 個或多個 prev，盡可能少地匹配       |\n",
    "|prev+\t\t\t\t|1 個或多個 prev，盡可能多地匹配       |\n",
    "|prev+?\t\t\t\t|1 個或多個 prev，盡可能少地匹配       |\n",
    "|prev{m}\t\t\t|m 個連續的 prev                       |\n",
    "|prev{m, n}\t\t\t|m 到 n 個連續的 prev，盡可能多地匹配  |\n",
    "|prev{m, n}?\t\t|m 到 n 個連續的 prev，盡可能少地匹配  |\n",
    "|[abc]\t\t\t\t|a 或 b 或 c(和 a直線符號b直線符號c 一樣)            |\n",
    "|[^abc]\t\t\t\t|非(a 或 b 或 c)                       |\n",
    "|prev (?=next)\t\t|如果後面為 next，返回 prev            |\n",
    "|prev (?!next)\t\t|如果後面非 next，返回 prev            |\n",
    "|(?<=prev) next\t\t|如果前面為 prev，返回 next            |\n",
    "|(?<!prev) next\t\t|如果前面非 prev，返回 next            |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1. ['wish', 'wish']\n",
      "2. ['wish', 'wish', 'fish']\n",
      "3. []\n",
      "4. ['I wish']\n",
      "5. []\n",
      "6. ['fish tonight。']\n",
      "7. []\n",
      "8. ['wish', 'wish', 'fish']\n",
      "9. ['w', 'sh', 'w', 'sh', 'h', 'sh', 'sh', 'h']\n",
      "10. ['ght\\n', 'ght。']\n",
      "11. ['I ', 'I ']\n",
      "12. [' wish', ' wish']\n",
      "13. []\n",
      "14. ['fish']\n",
      "\n",
      "--------------------\n",
      "a dish of fish\n",
      "('a dish', 'fish')\n",
      "a dish of fish\n",
      "('a dish', 'fish')\n",
      "a dish\n",
      "fish\n"
     ]
    }
   ],
   "source": [
    "source = '''I wish I may, I wish I might\n",
    "... Have a dish of fish tonight。'''\n",
    "\n",
    "# 1. 找wish\n",
    "print(\"1.\", re.findall('wish', source))\n",
    "\n",
    "# 2. 找wish或fish\n",
    "print(\"2.\", re.findall('wish|fish', source))\n",
    "\n",
    "# 3. 找wish開頭\n",
    "print(\"3.\", re.findall('^wish', source))\n",
    "\n",
    "# 4. 找I wish開頭\n",
    "print(\"4.\", re.findall('^I wish', source))\n",
    "\n",
    "# 5. 找fish結束\n",
    "print(\"5.\", re.findall('fish$', source))\n",
    "\n",
    "# 6. 找fish tonight(後面可以有無一個字元)\n",
    "print(\"6.\", re.findall('fish tonight.$', source))\n",
    "\n",
    "# 7. 找fish tonight.(使用跳脫符號，表示\\.為一個點而不是萬用字元)\n",
    "print(\"7.\", re.findall('fish tonight\\.$', source))\n",
    "\n",
    "# 8. 找wish與fish\n",
    "print(\"8.\", re.findall('[wf]ish', source))\n",
    "\n",
    "# 9. 找w、s、h組合出來的字串\n",
    "print(\"9.\", re.findall('[wsh]+', source))\n",
    "\n",
    "# 10. 找ght開頭，後面接著非字母 非數字 非底線字元\n",
    "print(\"10.\", re.findall('ght\\W', source))\n",
    "\n",
    "# 11. 找I開頭，後面是wish，但只返回前面\n",
    "print(\"11.\", re.findall('I (?=wish)', source))\n",
    "\n",
    "# 12. 找前面開頭是I的wish，指返回後面\n",
    "print(\"12.\", re.findall('(?<=I) wish', source))\n",
    "\n",
    "# 13. 原定希望找到fish然後前面是單詞邊界的地方，但是\\b被當作是跳脫字元返回符號了\n",
    "print(\"13.\", re.findall('\\bfish', source))\n",
    "\n",
    "# 14. 所以採用r來宣告說我這是一個原始的字串，不需要自動轉換\n",
    "print(\"14.\", re.findall(r'\\bfish', source))\n",
    "\n",
    "\n",
    "print('\\n--------------------')\n",
    "#用括號筆規則做區分後可以透過groups()取得分開的tuple，並且可以透過<name>設定名稱\n",
    "m = re.search(r'(. dish\\b).*(\\bfish)', source)\n",
    "print(m.group())\n",
    "print(m.groups())\n",
    "\n",
    "\n",
    "m = re.search(r'(?P<DISH>. dish\\b).*(?P<FISH>\\bfish)', source)\n",
    "print(m.group())\n",
    "print(m.groups())\n",
    "\n",
    "print(m.group('DISH'))\n",
    "print(m.group('FISH'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='bytesbytearray'></a>\n",
    "## 7.2.1 bytes and bytearray\n",
    "[回目錄](#HOME)\n",
    "\n",
    "恩...就是介紹bytes 和 bytearray的差別，一個可變一個不可變\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "b'\\x01\\x02\\x03\\xff'\n",
      "bytearray(b'\\x01\\x02\\x03\\xff')\n",
      "bytearray(b'\\x01\\x7f\\x03\\xff')\n"
     ]
    }
   ],
   "source": [
    "blist = [1, 2, 3, 255]\n",
    "the_bytes = bytes(blist)\n",
    "print(the_bytes)\n",
    "\n",
    "the_byte_array = bytearray(blist)\n",
    "print(the_byte_array)\n",
    "\n",
    "the_byte_array[1] = 127 #可變\n",
    "print(the_byte_array)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='struct'></a>\n",
    "## 7.2.2 使用struct轉換二進位資料\n",
    "[回目錄](#HOME)\n",
    "\n",
    "使用標準函式庫裡的struct來做為轉換二進位資料\n",
    "\n",
    "|符號 | Byte order |\n",
    "|----|--------|\n",
    "|<| 小端方案|\n",
    "|>| 大端方案|\n",
    "\n",
    "|標識符|\t描述|\t字節|\n",
    "|:--:|-----|:----:|\n",
    "|x\t|跳過一個字節|\t1|\n",
    "|b\t|有符號字節|\t1|\n",
    "|B\t|無符號字節|\t1|\n",
    "|h\t|有符號短整數|\t2|\n",
    "|H\t|無符號短整數|\t2|\n",
    "|i\t|有符號整數|\t4|\n",
    "|I\t|無符號整數|\t4|\n",
    "|l\t|有符號長整數|\t4|\n",
    "|L\t|無符號長整數|\t4|\n",
    "|Q\t|無符號 long long 型整數|\t8|\n",
    "|f\t|單精度浮點數|\t4\t|\t\t\n",
    "|d\t|雙精度浮點數|\t8\t\t|\t\n",
    "|p\t|數量和字符|\t1\t+\t數量\t|\n",
    "|s\t|字符|\t數量|\t\t\t\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Valid PNG, width 154 height 141\n",
      "b'\\x00\\x00\\x00\\x9a'\n",
      "b'\\x00\\x00\\x00\\x8d'\n",
      "(154, 141)\n",
      "(154, 141)\n"
     ]
    }
   ],
   "source": [
    "import struct\n",
    "valid_png_header = b'\\x89PNG\\r\\n\\x1a\\n'   #png的檔頭\n",
    "data = b'\\x89PNG\\r\\n\\x1a\\n\\x00\\x00\\x00\\rIHDR' + \\\n",
    "    b'\\x00\\x00\\x00\\x9a\\x00\\x00\\x00\\x8d\\x08\\x02\\x00\\x00\\x00\\xc0'   #一個圖檔的前段\n",
    "    \n",
    "if data[:8] == valid_png_header:\n",
    "    width, height = struct.unpack('>LL', data[16:24])\n",
    "    print('Valid PNG, width', width, 'height', height)\n",
    "else:\n",
    "    print('Not a valid PNG')\n",
    "    \n",
    "#反過來轉換\n",
    "print(struct.pack('>L', 154))\n",
    "print(struct.pack('>L', 141))\n",
    "\n",
    "print(struct.unpack('>2L', data[16:24]))\n",
    "print(struct.unpack('>16x2L6x', data))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='OtherTools'></a>\n",
    "## 7.2.3 其他二進位工具\n",
    "[回目錄](#HOME)\n",
    "\n",
    "* bitstring（ https://code.google.com/p/python-bitstring/ ）\n",
    "* construct（ http://construct.readthedocs.org/en/latest/ ）\n",
    "* hachoir（ https://bitbucket.org/haypo/hachoir/wiki/Home ）\n",
    "* binio（ http://spika.net/py/binio/ ）\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='binascii'></a>\n",
    "## 7.2.4 binascii函數\n",
    "[回目錄](#HOME)\n",
    "\n",
    "十六進制、六十四進制、uuencoded，等等之間轉換的函數。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "b'89504e470d0a1a0a'\n",
      "b'\\x89PNG\\r\\n\\x1a\\n'\n"
     ]
    }
   ],
   "source": [
    "import binascii\n",
    "\n",
    "#八字節轉十六bytes\n",
    "valid_png_header = b'\\x89PNG\\r\\n\\x1a\\n'\n",
    "print(binascii.hexlify(valid_png_header))\n",
    "\n",
    "#十六bytes轉八bytes\n",
    "print(binascii.unhexlify(b'89504e470d0a1a0a'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "<a id='BitOperators'></a>\n",
    "## 7.2.5 位元運算\n",
    "[回目錄](#HOME)\n",
    "\n",
    "以整數 a（十進制 5，二進制 0b0101）和 b（十進制 1，二進制 0b0001）做示範\n",
    "\n",
    "<table>\n",
    "  <tr>\n",
    "    <th>運算符</th>\n",
    "    <th>描述</th>\n",
    "    <th>示範</th>\n",
    "    <th>十進制結果</th>\n",
    "    <th>二進制結果</th>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>&</td>\n",
    "    <td>與</td>\n",
    "    <td>a & b</td>\n",
    "    <td>1</td>\n",
    "    <td>0b0001</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>|</td>\n",
    "    <td>或</td>\n",
    "    <td>a | b</td>\n",
    "    <td>5</td>\n",
    "    <td>0b0101</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>^</td>\n",
    "    <td>異或</td>\n",
    "    <td> a ^ b</td>\n",
    "    <td>4</td>\n",
    "    <td>0b0100</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>~</td>\n",
    "    <td>翻轉</td>\n",
    "    <td>~a</td>\n",
    "    <td>-6</td>\n",
    "    <td>取決於 int 類型的大小</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td><<</td>\n",
    "    <td>左位移</td>\n",
    "    <td>a << 1</td>\n",
    "    <td>10</td>\n",
    "    <td>0b1010</td>\n",
    "  </tr>\n",
    "  <tr>\n",
    "    <td>>></td>\n",
    "    <td>右位移</td>\n",
    "    <td>a >> 1</td>\n",
    "    <td>2</td>\n",
    "    <td>0b0010</td>\n",
    "  </tr>\n",
    "</table>\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [Root]",
   "language": "python",
   "name": "Python [Root]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
